# Low-resource speech processing

Whisper Small Ta

A Tamil speech recognition model fine-tuned from OpenAI's Whisper Small on the Common Voice 17.0 dataset, with a Word Error Rate (WER) of 43.23%.

Speech Recognition

Transformers Other
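The WER quoted for Whisper Small Ta is conventionally the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below shows that calculation under two assumptions not stated in the listing: words are split on whitespace and no text normalization is applied. The function name `word_error_rate` is illustrative and is not taken from the model card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.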

Whisper Fa Tinyyy

A Persian automatic speech recognition model fine-tuned from OpenAI's Whisper-tiny on the common_voice_11_0 dataset

Speech Recognition

Transformers Other

Arabic Alphabet Speech Classification

A Transformers model for Arabic alphabet speech classification that recognizes and classifies the pronunciation of Arabic letters.

Audio Classification

Whisper Large V3 Taiwanese Hakka

A fine-tuned Whisper-large-v3 model for Taiwanese Hakka speech recognition, supporting multiple Hakka dialects

Speech Recognition

Transformers Other

Vegam Whisper Medium Ml

This is a version of thennal/whisper-medium-ml converted to the CTranslate2 model format for Malayalam speech recognition

Speech Recognition Other

Exp W2v2t Th Hubert S533

A Thai speech recognition model fine-tuned from facebook/hubert-large-ll60k, trained on data from Common Voice 7.0

Speech Recognition

Transformers Other

Ai Light Dance Stepmania Ft Wav2vec2 Large Xlsr 53 V3

Automatic speech recognition model based on wav2vec2-large-xlsr-53, fine-tuned on the GARY109/AI_LIGHT_DANCE dataset

Speech Recognition

Asr Wav2vec2 Dvoice Amharic

This is an automatic speech recognition model for Amharic, trained using the wav2vec 2.0 architecture with a CTC/Attention mechanism

Speech Recognition Other

A speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on an unknown dataset, supporting recognition of Arabic dialects (Arabizi).

Speech Recognition

Wav2vec2 Large 100h Lv60 Self

Wav2Vec2-Large-100h-Lv60 is a large model pre-trained and fine-tuned on 100 hours of Libri-Light and Librispeech speech data with a self-training objective. It is suited to speech recognition on audio sampled at 16 kHz.

Speech Recognition

Transformers English
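Several models in this list state that they expect 16 kHz audio, so recordings at another rate need resampling before inference. The sketch below shows the idea with plain linear interpolation in pure Python. It is an illustration only: the function name `resample_linear` is hypothetical, and it applies no anti-aliasing filter, so a dedicated audio resampler is preferable for real use.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16_000) -> list[float]:
    """Resample a mono signal by linear interpolation (no anti-alias filtering)."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or not samples:
        return list(samples)

    last = len(samples) - 1
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample

    out = []
    for k in range(n_out):
        pos = min(k * step, float(last))  # clamp to the final source sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    one_ms_at_48k = [float(i) for i in range(48)]
    print(len(resample_linear(one_ms_at_48k, 48_000)))  # 48 samples in, 16 out
```

The same function covers the 8 kHz case by passing `dst_rate=8_000`.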

Wav2vec2 Common Voice Tr Demo

A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Turkish

A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset

Speech Recognition Other

Wav2vec2 Large Xlsr Arabic Demo Colab

An Arabic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset

Speech Recognition

Wav2vec2 Large Xlsr 53 Hungarian

This is a Hungarian automatic speech recognition model fine-tuned from the facebook/wav2vec2-large-xlsr-53 model, trained using the Common Voice dataset.

Speech Recognition Other

Fb Youtube Vi Large

This model is an automatic speech recognition model fine-tuned on Vietnamese YouTube informal audio datasets, based on facebook/wav2vec2-large-xlsr-53.

Speech Recognition

A Vietnamese automatic speech recognition model fine-tuned on the PHONGDTD/VINDATAVLSP - NA dataset based on microsoft/wavlm-base-plus

Speech Recognition

Wav2vec2 Large Xlsr Tamil Commonvoice

A Tamil speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice Tamil dataset

Speech Recognition

Wav2vec2 Large Xlsr 53 Sw

A Swahili automatic speech recognition model fine-tuned from the XLSR-53 large model; it expects audio input sampled at 16 kHz

Speech Recognition Other

W2v Timit Ft 4001

A speech recognition model based on Wav2Vec 2.0 architecture, fine-tuned on the TIMIT dataset, suitable for English speech-to-text tasks

Speech Recognition

Wav2vec2 Large Xlsr Finnish

This is a Finnish automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained using the Common Voice dataset.

Speech Recognition Other

Wav2vec2 Base Timit Demo Colab

A speech recognition model fine-tuned on the common_voice dataset based on anas/wav2vec2-large-xlsr-arabic

Speech Recognition

Unispeech 1350 En 168 Es Ft 1h

UniSpeech is a unified speech representation learning model that combines labeled and unlabeled data for pre-training. This checkpoint is fine-tuned for Spanish phoneme recognition.

Speech Recognition

Transformers Spanish

Wav2vec2 Base 10k Voxpopuli Ft Cs

A speech recognition model based on Facebook's Wav2Vec2 architecture, pre-trained on 10K hours of unlabeled data from the VoxPopuli corpus and fine-tuned on Czech transcription data.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Upper Sorbian Mixed

This is an Upper Sorbian speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, trained on data from the Common Voice dataset and online Sorbian courses.

Speech Recognition Other

This model is an automatic speech recognition model fine-tuned on the Common Voice 7.0 AB dataset, based on the XLS-R dummy architecture

Speech Recognition

Transformers Other

Wav2vec2 Xls R 300m W2V2 XLSR 300M YAKUT SMALL

This is a speech recognition model fine-tuned on the Yakut (Sakha) language dataset based on the facebook/wav2vec2-xls-r-300m model

Speech Recognition

Transformers Other

Arabic Speech Recognition

An Arabic automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53; it expects audio input sampled at 16 kHz

Speech Recognition Arabic

Wav2vec2 Xls R 300m Lg

A Luganda (lg) automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the COMMON_VOICE - LG dataset.

Speech Recognition

Transformers Other

Sew D Small 100k Ft Timit

An automatic speech recognition model fine-tuned on the TIMIT_ASR dataset based on asapp/sew-d-small-100k

Speech Recognition

patrickvonplaten

Wav2vec2 Large Xls Ar

An Arabic automatic speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53, achieving a WER of 52% on the Common Voice Arabic dataset.

Speech Recognition

Transformers Arabic

Wav2vec2 Large Xls R 300m My Hindi Home Colab

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on a general speech dataset, suitable for speech recognition tasks.

Speech Recognition

Wav2vec2 Base 10k Voxpopuli Ft Sk

A Wav2Vec2 base model pre-trained on 10K hours of unlabeled VoxPopuli corpus data and fine-tuned on Slovak transcription data

Speech Recognition

Transformers Other

Wav2vec2 Base 10k 8khz Pt Cv7 2

A Portuguese automatic speech recognition model based on the wav2vec2 architecture, fine-tuned on the Common Voice 7 dataset; it accepts audio input sampled at 8 kHz.

Speech Recognition

Transformers Other

Wav2vec2 Large Xlsr Turkish Demo Colab

A Turkish speech recognition model fine-tuned from facebook/wav2vec2-large-xlsr-53 on the Common Voice dataset

Speech Recognition

patrickvonplaten

Wav2vec2 Large Xls R 300m Ab V4

This is an automatic speech recognition model fine-tuned on the Abkhazian (ab) dataset based on Facebook's wav2vec2-xls-r-300m model

Speech Recognition

Transformers Other

This is an automatic speech recognition model fine-tuned on the COMMON_VOICE - AB dataset, based on the XLS-R Dummy architecture

Speech Recognition

Transformers Other

© 2025 AIbase